ABSTRACT

Background The majority of coeliac disease (CD) 
patients are not being properly diagnosed and therefore 
remain untreated, leading to a greater risk of developing 
CD-associated complications. The major genetic risk
heterodimers, HLA-DQ2 and DQ8, are already used
clinically to help exclude disease. However,
approximately 40% of the population carry these alleles 
and the majority never develop CD. 
Objective We explored whether CD risk prediction can 
be improved by adding non-HLA-susceptible variants to 
common HLA testing. 

Design We developed an average weighted genetic risk 
score with 10, 26 and 57 single nucleotide 
polymorphisms (SNP) in 2675 cases and 2815 controls 
and assessed the improvement in risk prediction provided 
by the non-HLA SNP. Moreover, we assessed the 
transferability of the genetic risk model with 26 non-HLA 
variants to a nested case-control population (n=1709)
and a prospective cohort (n=1245) and then tested how
well this model predicted CD outcome for 985 
independent individuals. 

Results Adding 57 non-HLA variants to HLA testing 
showed a statistically significant improvement compared 
to scores from models based on HLA only, HLA plus 10 
SNP and HLA plus 26 SNP. With 57 non-HLA variants, 
the area under the receiver operator characteristic curve 
reached 0.854 compared to 0.823 for HLA only, and 
11.1% of individuals were reclassified to a more
accurate risk group. We show that the risk model with 
HLA plus 26 SNP is useful in independent populations. 
Conclusions Predicting risk with 57 additional 
non-HLA variants improved the identification of potential 
CD patients. This demonstrates a possible role for 
combined HLA and non-HLA genetic testing in 
diagnostic work for CD. 



INTRODUCTION 

Coeliac disease (CD) is a chronic immune-mediated 
enteropathy triggered by exposure to dietary gluten 
in genetically predisposed individuals. 1 Screening 
studies have revealed increased occurrence in some 
countries, with a prevalence ranging from 0.3% to 
3%, always with the majority of cases being previ- 
ously undiagnosed. 2-7 Age at onset ranges from 



Significance of this study 



What is already known on this subject? 

► HLA-DQ2 and DQ8 provide the highest genetic 
risk for CD. However, these genes are present 
in about 40% of the population, and only a 
subset will develop disease. Therefore, 
screening for HLA-DQ2 and DQ8 alleles is 
helpful only to identify those at extremely low 
risk for CD. 

► Current recommendations are to perform 
periodic screening of certain high-risk groups 
for CD, such as first-degree relatives and those 
with type 1 diabetes. However, the degree of 
risk is not uniform among all of these groups. 

► Current methods of genetic testing are 
inadequate at effectively identifying individuals 
from the general population at significantly 
greater risk for CD who may require periodic 
serological screening for CD. 

What are the new findings? 

► Increases in the number of variants associated 
with CD have helped refine and improve the 
genetic risk model. 

► Using HLA variants, 57 non-HLA variants,
gender and population origin improved
the discriminatory power, with the AUC of the
ROC curve reaching 84%.

► Combining HLA and 57 non-HLA variants 
improved the classification of 11% of 
individuals to more accurate categories. 



infancy to late adulthood, and clinical presentation 
can be highly variable, from impaired growth, diar- 
rhoea and abdominal pain to presentations such as 
iron-deficiency anaemia and decreased bone
density. 8-10 Family members of CD patients and 
those with another immune-mediated disease are at 
higher risk of developing CD. As symptoms of CD 
can be subtle or insidious, current recommenda- 
tions are to screen such at-risk groups with periodic 
Significance of this study



How might it impact on clinical practice in the 
foreseeable future? 

► Although we only screen individuals with a 'known' risk for
CD (because they belong to an 'at-risk' group), the majority
of cases of CD come from individuals who have permissive
HLA in the general population. The ability to identify an
individual at 'extreme' risk for CD could make the current 
serological screening strategy more effective by personalising 
the approach in the general population. This is a first step 
towards the application of genetic testing for CD in the 
clinical setting and/or on a population level. 

► Genetic testing for CD may assist in the early detection of 
individuals at risk of CD, ie, those with a first-degree relative 
with CD and those with autoimmune diseases showing 
comorbidity with CD. 



serological testing. The serological antibodies used as markers 
for CD have a relatively high sensitivity and specificity, and in 
most cases a small bowel biopsy revealing enteropathy is neces- 
sary to confirm the diagnosis of villous atrophy. 11 

The importance of genetic testing was highlighted in the 
revised guidelines for the diagnosis of CD recently proposed by 
the European Society of Paediatric Gastroenterology, 
Hepatology and Nutrition. 12 They recommend typing for 
HLA-DQ2 and HLA-DQ8 in symptomatic children with high 
clinical suspicion of CD, but without confirmatory biopsy, to 
add strength to the diagnosis, as well as to those with an uncer- 
tain diagnosis of CD or to those belonging to risk groups, to 
exclude further testing for CD. These HLA heterodimers are 
known to be the major genetic risk factors for CD and have a 
negative predictive value of almost 100%, but the positive pre- 
dictive value is poor, as approximately 40% of the population 
carry one or both of these alleles. 13 In the past few years, two 
genome-wide association studies (GWAS) and one fine-mapping 
project have identified up to 57 non-HLA single-nucleotide 
polymorphisms (SNP) that contribute to CD susceptibility. 14-17 
To date, approximately 54% of the genetics of CD can be 
explained by HLA plus the 57 non-HLA SNP compared to 40% 



by HLA alone. 18 In 2009, we published a genetic risk model for 
CD using HLA and the 10 non-HLA risk variants resulting 
from the first GWAS. 19 We showed that by using this model, the 
identification of individuals at high risk of developing CD could 
be markedly improved. Now, with many more associated loci 
known for CD, our aim was to test if the genetic risk model 
could be improved by adding the new variants, assess how well 
it transfers to other cohorts, and evaluate how well it can be 
used in clinical practice. 



MATERIALS AND METHODS 
Study populations 

Our study included four groups (table 1): (1) a discovery set of 
2675 CD cases and 2822 healthy controls in which we calcu- 
lated the OR for each SNP after having identified the mode of 
inheritance; (2) a derivation set of 2675 cases and 2815 controls 
in which we created the risk model; (3) two sets for validating 
the risk model, which included a 1709 nested case-control 
population (validation set 1), and a prospective cohort of 1244 
individuals (validation set 2); and (4) a test set of 985 independ- 
ent individuals on whom we applied the risk model. 

The discovery and derivation case-control samples were pre- 
viously included in our CD meta-analysis and incorporated 
cohorts from The Netherlands, Italy, Poland, Spain and the 
UK. 17 To prevent over-fitting of the model, we randomly 
selected 50% of the cases and controls to form a discovery 
dataset in which we calculated the OR, while the other half 
became the derivation set to create the risk model (table 1). The 
samples were evenly distributed across the different populations, 
except for the UK cohort, from which we randomly selected 
700 cases and 1000 controls to obtain sample sizes equal to the 
other populations. 

The first validation set included cases and matched controls 
from a Swedish cross-sectional CD screening of 12-year-old chil- 
dren. Most of these children were born in 1993 or 1997. 5 20 21 
Together the two cohorts contain 306 CD patients for whom 
DNA was available. Gender-matched controls (1403 individuals)
were randomly selected among those with normal levels of CD 
markers belonging to the corresponding cohort. As there was no 
difference between the frequencies of SNP in the two cohorts, 
we treated the 1993 and 1997 cohorts as one collection in our 
analysis. 



Table 1 The different datasets included in this study: a discovery set for single SNP OR calculation, a derivation set to create the risk models,
two validation sets to validate the risk model, and a test set to evaluate the model in clinical practice

                             Discovery set:     Derivation set:    Validation set 1:     Validation set 2:   Test set:
                             case-control       case-control       nested case-control   prospective         case-control
Cohorts                      Cases   Controls   Cases   Controls   Cases   Controls      CDA     No CDA      Cases   Controls
Italy                        695     635        693     635        -       -             -       -           99      219
The Netherlands              535     586        535     583        -       -             -       -           61      175
Poland                       235     270        236     269        -       -             -       -           50      67
Spain 1                      242     171        242     170        -       -             -       -           34      122
Spain 2                      268     160        269     159        -       -             -       -           33      125
UK                           700     1000       700     999        -       -             -       -           -       -
Sweden                       -       -          -       -          306     1403          -       -           -       -
Non-Hispanic white American  -       -          -       -          -       -             70      1174        -       -
Sub-total                    2675    2822       2675    2815       306     1403          70      1174        277     708
Total                        5497               5490               1709                  1244                985

CDA, coeliac disease autoimmunity; SNP, single-nucleotide polymorphism.
The second validation set included 1244 non-Hispanic, white
American children from a prospective population-based cohort 
from Denver, Colorado, USA; they are being followed from 
birth for the development of transglutaminase auto-antibodies 
and CD (the DAISY study). 22 

The test set included 985 parents of high-risk CD children 
(those with a first-degree relative with CD) from The 
Netherlands, Italy, Poland and Spain, which were collected as 
part of the PreventCD project. 21 

The datasets were collected for different purposes by different
investigators and are independent of each other. All subjects had
self-reported Caucasian ancestry and have been described else- 
where. 17 20-22 CD patients in the discovery, derivation and test 
sets had a biopsy-confirmed diagnosis. In validation set 1, CD
diagnosis required villous atrophy or intraepithelial lymphocytosis
in combination with the presence of HLA-DQ2 or
HLA-DQ8, as well as symptoms or signs supporting the diagnosis.
In validation set 2, CD was defined as having a very high
and persistent level of transglutaminase auto-antibodies or con- 
firmed by biopsy, so we refer to this group as having CD 
autoimmunity. 23 

Genotyping 

Individuals homozygous for HLA-DQ2.5 or HLA-DQ2.5/ 
DQ2.2 genotypes have an increased CD risk compared to those 
homozygous for HLA-DQ2.2 or DQ8, or heterozygous for 
HLA-DQ2.5, DQ2.2 or DQ8, while individuals with no-DQ2/ 
DQ8 have practically no risk for CD. 19 24-26 To predict whether 
an individual has 0, 1 or 2 HLA-DQ2 and/or DQ8 alleles, we 
genotyped six tagging SNP. 27 We then categorised the individuals
into three risk groups: low-risk (coded 0) if they were
HLA-DQ2/DQ8 negative (ie, neither HLA-DQ2.5, DQ2.2 nor 
DQ8), high-risk (coded 2) for those homozygous for 
HLA-DQ2.5 or HLA-DQ2.5/DQ2.2, and intermediate risk 
(coded 1) for all other combinations. 19 
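The three-level HLA coding described above can be sketched as a small function. This is a minimal illustration rather than the pipeline used in the study: the haplotype labels (`DQ2.5`, `DQ2.2`, `DQ8`, and `X` for any other haplotype) and the function name are assumptions, and the step that infers haplotypes from the six tagging SNP is not shown.

```python
# Hypothetical haplotype labels; "X" stands for any non-DQ2/DQ8 haplotype.
RISK_HAPLOTYPES = {"DQ2.5", "DQ2.2", "DQ8"}


def hla_risk_group(hap1: str, hap2: str) -> int:
    """Return 0 (low), 1 (intermediate) or 2 (high) HLA risk."""
    pair = sorted([hap1, hap2])
    # Low risk: neither haplotype is DQ2.5, DQ2.2 or DQ8.
    if not (set(pair) & RISK_HAPLOTYPES):
        return 0
    # High risk: homozygous DQ2.5, or DQ2.5 together with DQ2.2.
    if pair in (["DQ2.5", "DQ2.5"], ["DQ2.2", "DQ2.5"]):
        return 2
    # Intermediate risk: every other DQ2/DQ8-positive combination.
    return 1
```

Under this coding, a DQ2.2 homozygote or a DQ8 heterozygote falls in the intermediate group, as stated in the text.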

To assess if the new susceptibility variants improve risk pre- 
diction, we compared three genetic risk scores (GRS) calculated 
using: (1) 10 non-HLA SNP from the first GWAS and its 
follow-up; 14 15 (2) 26 non-HLA SNP from the second GWAS; 16 
and (3) 57 non-HLA SNP from the fine-mapping project 17 (see 
supplementary table 1, available online only). All these SNP 
were reported at genome-wide significance (p<5×10⁻⁸) in each
study. 

For the discovery and derivation sets, genotype data were 
acquired as part of our fine-mapping project using Immunochip, 
a custom-made platform from Illumina. 28 A stringent quality 
control check was performed on these samples. 17 Samples in 
validation sets 1 and 2, and in the test set, were genotyped on
Illumina 48-plex VeraCode technology for the 26 SNP identified
in the second GWAS only and the six HLA tagging SNP, following
Illumina's protocol. Genotyping data analysis and clustering
was performed in GenomeStudio. Genotype clusters were 
manually investigated and adjusted if necessary. All plates 
included one duplicate sample and one positive control. One 
SNP, corresponding to the IL18RAP locus (imm_2_102429801),
was not present on VeraCode, so we used a perfect proxy
(rs917997, r² = 1, D' = 1) (see supplementary table 1, available
online only). 

Statistical analysis 

Using the derivation cohort, we coded each SNP genotype as 0 
for the non-risk homozygous, 1 for the heterozygous, and 2 for 
the homozygous risk, then determined the type of inheritance 
mode by analysing the genotypes as categorical variables in 



logistic regression and adjusting for HLA group, gender and 
population origin. Comparing the Akaike information criterion 
(AIC) from each model, we saw no major differences between 
the inheritance models and therefore used the log-additive 
model, which was the best-fit model for most SNP.

In order to account for a difference in risk contribution from 
each SNP, we used a weighted method and calculated an average
GRS for each individual. First, we multiplied the β-coefficients
in supplementary table 1 (available online only) by the number
of risk alleles (0, 1, 2) for each SNP per individual, took the
sum across 10, 26 or 57 non-HLA SNP, and then divided the
total by the number of alleles included in the model to obtain 
an average weighted GRS per allele. Only individuals with a 
defined HLA genotype and with more than 95% of genotypes 
available were included in the analysis. We used an averaged 
GRS per allele in order to be able to compare GRS from differ- 
ent datasets with different numbers of SNP that passed the 
quality control. Then, the GRS were categorised in quintiles of 
the control population. The controls in validation set 1 were 
healthy individuals who had a negative screening result for CD; 
we used both cases and controls to calculate the quintiles. For 
validation set 2, we had genotype data from 986 non-Hispanic 
white American individuals from the general population, which 
we used to calculate the quintiles. In each validation set, we esti- 
mated the risk for each category of the GRS in a logistic regres- 
sion using the third quintile (p40-p60) as a reference group 
adjusting for HLA group, gender and population origin. 
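The scoring and quintile steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: we assume that "the number of alleles included in the model" means two alleles per successfully genotyped SNP, that missing genotypes are represented as `None`, and that quintile cut-offs are taken from a reference sample with Python's default quantile method. The function names are hypothetical.

```python
from statistics import quantiles
from typing import Optional, Sequence


def average_weighted_grs(
    betas: Sequence[float],
    risk_allele_counts: Sequence[Optional[int]],
    min_call_rate: float = 0.95,
) -> Optional[float]:
    """Average weighted GRS per allele; None if too few genotypes are called."""
    if not betas or len(betas) != len(risk_allele_counts):
        raise ValueError("one beta per SNP is required")
    # Keep only SNP with a called genotype (0, 1 or 2 risk alleles).
    called = [(b, g) for b, g in zip(betas, risk_allele_counts) if g is not None]
    # The text requires more than 95% of genotypes to be available.
    if len(called) / len(betas) <= min_call_rate:
        return None
    weighted_sum = sum(b * g for b, g in called)
    # Assumption: two alleles per genotyped SNP in the denominator.
    return weighted_sum / (2 * len(called))


def quintile_category(score: float, reference_scores: Sequence[float]) -> int:
    """Return 1 (bottom quintile) to 5 (top quintile) relative to a reference."""
    cut_points = quantiles(reference_scores, n=5)  # four cut-offs
    return 1 + sum(score > c for c in cut_points)
```

Dividing by the number of alleles actually genotyped keeps scores comparable between datasets in which different numbers of SNP passed quality control, which is the motivation given in the text.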

To evaluate the overall discrimination of our genetic model,
we calculated the area under the receiver operator characteristic
(ROC) curve (AUC) for the HLA-only model and for the model
combining HLA and the GRS. We also calculated the net reclassification
improvement (NRI) and the integrated discrimination improvement
(IDI). A two-tailed p value less than 0.05 indicated statistical
significance. All analyses were performed using PLINK
v1.07, the R package PredictABEL, and SPSS V16.0. 29 30
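As a reminder of what the AUC measures, it equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition. It is not the PredictABEL routine used in the study; the function name and the toy scores are hypothetical.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the fraction of case-control pairs ranked correctly."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0   # case correctly ranked above control
            elif case == control:
                wins += 0.5   # ties count as half
    return wins / (len(case_scores) * len(control_scores))


# Toy predicted risks, for illustration only.
toy_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic in sample size, which is fine for a sketch; rank-based implementations give the same value more efficiently.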

RESULTS 

Figure 1 shows the distribution of HLA and the three GRS in 
the large derivation set of 2675 CD cases and 2815 controls. 
The mean in cases is shifted towards a higher GRS in all three 
models compared to the mean in controls, showing a clear sep- 
aration of distribution between the two groups. We divided par- 
ticipants into five categories defined as quintiles of the control 
populations to make it easier to interpret the results of an 
average weighted GRS (the third quintile was considered the ref- 
erence category). The OR increases with increasing risk score 
for all three GRS models (see supplementary figure 1, available 
online only). The GRS_57 performs better than GRS_26 and 
GRS_10 mainly in the top quintile (p80-p100). Individuals in
the top quintile of GRS_57 had a 2.5 times higher risk (95% CI 
2.1 to 3.0) than those with a mean GRS, and a 7.2 times higher 
risk (95% CI 5.7 to 9.2) than those in the bottom quintile. 

Figure 2 shows the ROC curves for HLA only, HLA plus 
GRS_10, HLA plus GRS_26 and HLA plus GRS_57. The AUC 
estimates were improved with an increasing number of suscepti- 
bility variants used in the model. Combining HLA with 57 
non-HLA SNP showed the best discrimination, with an AUC 
reaching 0.854. The improvement between the HLA-only 
model and the models with HLA plus GRS was statistically 
highly significant (p = 0.0001). 

To confirm that adding non-HLA risk variants improved risk 
prediction, we tested the ability of the combined HLA and GRS 
models to reclassify individuals into predefined risk groups 
based on HLA testing only. The individuals could be grouped 
Figure 1 Distribution of HLA group and average risk scores of the genetic risk score (GRS)_10, GRS_26 and GRS_57 models in 2675 cases and
2815 controls. GRS_10, GRS_26 and GRS_57 show a clear separation of distribution between cases and controls with the mean (SD) in cases (0.103
(0.020), 0.071 (0.009), 0.069 (0.006), respectively) being statistically different to the mean (SD) in controls (0.095 (0.020), 0.067 (0.009), 0.066
(0.006), respectively) (p=2.71×10⁻⁴⁵, 3.41×10^[…], 3.2×10⁻¹¹¹, respectively (independent sample two-tailed t test)).



into three categories: low (predicted risk <25%), intermediate 
(25-75%) and high-risk (>75%), thus we used the same 
cut-offs to classify individuals using the models with HLA plus 
GRS (figure 3). Among the 1590 cases that have intermediate 
risk based on their HLA only (derivation set), 241 (15.1%) indi- 
viduals were moved into the high-risk category (>75%) when 
their GRS with 57 variants was added (table 2). Similarly, 25 
(18.2%) of the 137 controls first classified as high risk (>75%) 
were moved to the intermediate-risk category and 212 of 1373 
intermediate-risk controls (15.4%) were moved to the low-risk 
category (<25%). NRI and IDI were statistically significant for 
all models. Even when we used 20% and 80%, or 30% and 
70% as cut-offs, the NRI and IDI were still significant. The 
model with 57 SNP performed best by reclassifying 11.1% of 
the individuals into a more accurate risk group, while GRS_26 
reclassified 7.1% and GRS_10 reclassified 4.1%. 
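The reported NRI can be recomputed from the reclassification counts in table 2. The sketch below uses the standard categorical definition (net proportion of cases moving up plus net proportion of controls moving down) and the GRS_57 counts; the function and variable names are hypothetical, and the calculation in the study itself was done with PredictABEL.

```python
from typing import Sequence, Tuple

Table = Sequence[Sequence[int]]  # rows: HLA-only category, columns: HLA plus GRS


def moves(table: Table) -> Tuple[int, int, int]:
    """Count individuals moved to a higher or lower category, and the total."""
    up = sum(n for i, row in enumerate(table) for j, n in enumerate(row) if j > i)
    down = sum(n for i, row in enumerate(table) for j, n in enumerate(row) if j < i)
    total = sum(sum(row) for row in table)
    return up, down, total


def categorical_nri(cases: Table, controls: Table) -> float:
    """Net reclassification improvement for ordered risk categories."""
    up_cases, down_cases, n_cases = moves(cases)
    up_controls, down_controls, n_controls = moves(controls)
    return (up_cases - down_cases) / n_cases + (down_controls - up_controls) / n_controls


# Counts from table 2, HLA only versus HLA and GRS_57 (<25%, 25-75%, >75%).
cases_grs57 = [[114, 0, 0], [49, 1300, 241], [0, 52, 919]]
controls_grs57 = [[1305, 0, 0], [212, 1089, 72], [0, 25, 112]]
nri_grs57 = categorical_nri(cases_grs57, controls_grs57)  # rounds to 0.111
```

Here 241 cases move up and 101 move down (net 140 of 2675), while 237 controls move down and 72 move up (net 165 of 2815), which gives the 11.1% quoted in the text.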

To assess if such a genetic risk model is applicable to other 
populations, we tested the GRS with 26 SNP in two nested 
case-control studies from Sweden (validation set 1) and in a 



prospective cohort from the USA (validation set 2), both of 
which had not been assessed in previous gene discoveries. 

In the Swedish study, the mean of GRS_26 in controls of 
0.068 (SD 0.0099) was statistically different from the mean of 
cases (0.071, SD 0.0097) (independent sample two-tailed t
test, p=1.28×10⁻⁵). Based on HLA genotypes, we first categorised
the individuals into three groups and identified only one CD 
case in the low-risk group (no HLA-DQ2/DQ8), indicating the 
high negative predictive value of HLA typing to exclude CD 
risk. We further focused our test on those individuals positive 
for DQ2 and/or DQ8 (n=1035). The predicted risk based on 
HLA only ranged from 23.57% to 27.74% for the intermediate 
HLA group, and 60.95% to 66.02% for the high-risk HLA 
group. Using the lowest ranges as a cut-off for reclassification, 
31% (215/695) of the controls in the intermediate group (23- 
60%) were moved to the low-risk group (<23%). The NRI of 
HLA-only versus HLA plus GRS_26 was 0.116 (95% CI 0.051 
to 0.180; p=0.00042), while IDI was 0.013 (95% CI 0.006 to 
0.020; p = 0.0004) (data not shown). 
Figure 2 Receiver operator characteristic (ROC) curves and area under
the curve (AUC) for the HLA-only model (AUC=0.823; 95% CI 0.812 to
0.834), and combined HLA plus GRS_10 (AUC=0.837; 95% CI 0.827 to
0.848), HLA plus GRS_26 (AUC=0.843; 95% CI 0.832 to 0.853) and
HLA plus GRS_57 (AUC=0.854; 95% CI 0.844 to 0.864) models.



In the prospective cohort (validation set 2), we categorised 
individuals based on quintiles calculated from a general popula- 
tion cohort and used the lowest quintile (p0-p20) as a reference 
group. Based on HLA, there were no CD autoimmunity cases in 
the lowest group and our analysis continued with 1116 indivi- 
duals who were DQ2 and/or DQ8 positive. Using the Cox pro- 
portional hazard model adjusted for gender, recruitment group 
and HLA, we observed an increase in HR with increasing risk 
score category (see supplementary figure 2, available online 
only). Although this was not statistically significant, it showed a 
trend of association, with the top group having a HR of 1.8 
(95% CI 0.81 to 3.98) compared to individuals in the lowest 
quintile. 

To test how well this risk profiling can be used in clinical practice,
we calculated a predicted risk for 985 independent individuals
(test set), using the OR calculated in validation set 1, before
unblinding their disease status (see supplementary table 2, available
online only). We then grouped the individuals into the risk
categories defined earlier. After checking the CD status of individuals,
we compared their classification from using only HLA in
the model to using HLA plus GRS_26. Combining HLA and 26 
non-HLA variants in the model led to 14.6% of the individuals 
being reclassified into more appropriate categories (table 3). 

DISCUSSION 

We demonstrate that combining HLA and non-HLA variants 
increases the diagnostic accuracy of genetic testing for CD. 
Previously, we showed better classification with a simple count 
model of 10 non-HLA variants. 19 Now we have further devel- 
oped this model by including up to 57 non-HLA SNP and com- 
paring four genetic risk models for CD including gender and 
population origin. We used a weighted GRS to account for the 
differences in OR of each allele. All three GRS were associated 
with CD in our case-control derivation set, with individuals in 



the top quintile having 1.68, 2.00 and 2.50 times higher risk of 
CD compared to those in the middle quintile. Individuals in the
bottom quintiles had 0.54, 0.44 and 0.45 times the risk of
developing CD of someone with a mean GRS from the
general population.

Adding non-HLA variants to the HLA prediction improved 
not only the discriminatory power as assessed by the ROC 
curves, but also the reclassification of individuals into more 
accurate risk categories with the increase in NRI and IDI. 
Compared to other genetically complex diseases such as mul- 
tiple sclerosis and type 2 diabetes, in which AUC only reached 
0.769 and 0.74, respectively, our GRS in CD performs 
well. 31 32 Our best AUC reached 0.854 for the GRS_57 model. 
This is in the same range as the Framingham risk score for cor- 
onary heart disease (AUC~0.8), which is clinically useful. 33 
Moreover, our risk model appears to be applicable to clinical 
practice and transferable to other populations, being specifically 
useful in individuals positive for HLA-DQ2 and/or DQ8. 

The ability to identify subgroups of those at 'extreme' risk or 
lower risk for CD will enable more accurate classifications of 
research subjects in clinical trials. For example, PreventCD is an 
ongoing intervention study that will evaluate whether the con- 
trolled introduction of small quantities of gluten between the 
age of 4 and 6 months can prevent the occurrence of CD in 
children carrying HLA-DQ2 and/or DQ8. However, many chil- 
dren in the study will never develop CD, as they do not carry 
the other risk factors required. This means that larger numbers 
of individuals are needed to test the potential treatment 
adequately. 21 The enhanced risk modelling will help classify 
individuals into higher and lower risk groups more accurately, 
by using both HLA and non-HLA genetic signatures, thereby 
permitting a more efficient study design and analysis in the 
future. 

From a clinical perspective, there are several at-risk groups 
of individuals who will require periodic serological screening 
for CD throughout their lifetime. It has been argued, although 
not universally recommended, that HLA testing could be done 
first to identify carriers of HLA-DQ2 and/or DQ8 and then to 
perform repeated serological testing only in those individuals 
in the future (although the risk of developing CD is not equal 
for HLA-DQ2 and HLA-DQ8 carriers). From a cost perspec- 
tive, this might be an efficient strategy as genotyping is rela- 
tively cheap and only needs to be done once, whereas 
serological testing is more expensive and needs to be repeated 
frequently. Excluding individuals who do not carry the genetic 
risk for developing CD from serological testing would reduce 
the cost and burden of repeated invasive testing. The age at 
which serological screening in an at-risk child should begin, 
how frequently to test, and when to perform intestinal biopsy 
are all issues that are still under discussion. The added value of 
non-HLA genetic factors is that they may allow us to stratify 
the population better into those in need of repeated serology 
screening, as HLA testing alone would still include some 30% 
of the population. Using only the presence or absence of HLA 
as a screening tool to help in the diagnosis of CD has a positive 
predictive value of 94%, but a sensitivity of 35%. However,
using our model, which combines different HLA risk variants
with non-HLA risk variants, to classify individuals into a
high-risk group decreases the positive predictive value to 57%, but
increases the sensitivity to 63%. Thus, including non-HLA risk
factors suggests that we can reclassify 14.6% of the population 
into more accurate risk categories, which might help to make a 
better selection of those who need closer follow-up and repeti- 
tive antibody testing. 
Figure 3 Plot of predicted risk using HLA-only model versus HLA and genetic risk score (GRS) models showing how individuals can be shifted
from one risk group to another. The GRS_57 model shows the largest number of individuals who were reclassified. All models were adjusted for
gender and five-population origin. The black vertical line defines the three groups based on HLA (low <25%, intermediate 25-75%, high >75%),
while the blue dashed line is the 25% predicted risk and the red dashed line is the 75% predicted risk based on HLA plus non-HLA variants.



Table 2 Reclassification table of individuals of predicted risk using HLA-only versus combined HLA and GRS_10, GRS_26 and GRS_57 (low risk
<25%, intermediate risk 25-75%, high risk >75%)

              HLA and GRS_10                       HLA and GRS_26                       HLA and GRS_57
HLA only      <25%   25-75%  >75%   Reclassified%  <25%   25-75%  >75%   Reclassified%  <25%   25-75%  >75%   Reclassified%
<25%
  Total       1419   0       0      0              1419   0       0      0              1419   0       0      0
  Cases       114    0       0      0              114    0       0      0              114    0       0      0
  Controls    1305   0       0      0              1305   0       0      0              1305   0       0      0
25-75%
  Total       64     2710    189    0.09           104    2562    297    0.14           261    2389    313    0.19
  Cases       12     1444    134    0.09           16     1354    220    0.15           49     1300    241    0.18
  Controls    52     1266    55     0.08           88     1208    77     0.12           212    1089    72     0.21
>75%
  Total       0      39      1069   0.04           0      81      1027   0.07           0      77      1031   0.07
  Cases       0      24      947    0.02           0      52      919    0.05           0      52      919    0.05
  Controls    0      15      122    0.11           0      29      108    0.21           0      25      112    0.18
NRI (95% CI)  0.041 (0.029 to 0.053); p=0.0001     0.071 (0.055 to 0.087); p=0.0001     0.111 (0.093 to 0.129); p=0.0001
IDI (95% CI)  0.021 (0.018 to 0.025); p=0.0001     0.031 (0.027 to 0.036); p=0.0001     0.054 (0.048 to 0.060); p=0.0001

GRS, genetic risk score; IDI, integrated discrimination improvement; NRI, net reclassification index.
Table 3 Reclassification table for HLA-only versus combined HLA
and GRS_26 in the test set of 985 individuals

              HLA and GRS_26
HLA only      <25%   25-75%  >75%   Reclassified%
<25%
  Total       243    0       0      0
  Cases       17     0       0      0
  Controls    226    0       0      0
25-75%
  Total       9      477     78     0.15
  Cases       0      102     48     0.32
  Controls    9      375     30     0.09
>75%
  Total       0      5       173    0.03
  Cases       0      1       109    0.01
  Controls    0      4       64     0.06

NRI (95% CI)  0.146 (0.093 to 0.199); p=0.0001
IDI (95% CI)  0.025 (0.014 to 0.037); p=0.0001

GRS, genetic risk score; IDI, integrated discrimination improvement; NRI, net
reclassification index.